Introduction to SynthesisFilters.jl

This notebook demonstrates how SynthesisFilters.jl works. It includes synthesized audio examples (Japanese speech) so that you can compare the synthesis filters in your browser.

The following synthesis filters are demonstrated in this notebook.

  • LMADF: Log magnitude approximation digital filter for synthesis from cepstrum
  • MLSADF: Mel-log spectrum approximation digital filter for synthesis from mel-cepstrum
  • MGLSADF: Mel generalized-log spectrum approximation digital filter for synthesis from mel-generalized cepstrum
  • AllPoleDF: All-pole digital filter for synthesis from LPC
  • AllPoleLatticeDF: All-pole lattice digital filter for synthesis from PARCOR
  • LSPDF: LSP digital filter for synthesis from LSP

Related packages:

  • MelGeneralizedCepstrums.jl: spectral parameter estimation based on mel-generalized cepstrum analysis
  • SPTK.jl: a thin wrapper for SPTK
  • WORLD.jl: a high-quality speech analysis, modification and synthesis system

In [1]:
using PyCall
matplotlib = pyimport("matplotlib")
PyDict(matplotlib["rcParams"])["figure.figsize"] = (12, 5)
using PyPlot


WARNING: using PyPlot.matplotlib in module Main conflicts with an existing identifier.

In [2]:
# https://gist.github.com/jfsantos/a39ed69a7894876f1e04#file-audiodisplay-jl
# Thanks, @jfsantos
include("AudioDisplay.jl")


Out[2]:
inline_audioplayer (generic function with 2 methods)

In [3]:
using WAV
using DSP
using MelGeneralizedCepstrums # to estimate spectral envelope parameters
using SynthesisFilters

In [4]:
# plotting utilities
function wavplot(x; label="a waveform", x_label="sample")
    plot(1:endof(x), x, "b", label=label)
    xlim(1, endof(x))
    xlabel(x_label)
    legend()
end

function wavcompare(x, y; label="synthesized waveform", x_label="sample")
    plot(1:endof(y), y, "r-+", label=label)
    plot(1:endof(x), x, label="original speech signal")
    xlim(1, endof(x))
    xlabel(x_label)
    legend()
end


Out[4]:
wavcompare (generic function with 1 method)

Data

In this notebook, we use the following audio data for analysis and re-synthesis. Let's look at and listen to the example.


In [5]:
x, fs = wavread(joinpath(dirname(@__FILE__), "data", "test16k.wav"), format="native")
x = convert(Vector{Float64}, vec(x))
fs = convert(Int, fs)
wavplot(x)
inline_audioplayer(map(Int16, x), fs)


Speech parameter extraction

To synthesize a waveform, you basically need two speech parameters:

  • excitation signal
  • spectral parameters (e.g. Mel-cepstrum)

MelGeneralizedCepstrums.jl supports extracting many kinds of spectral parameters.
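
To make the roles of the two parameters concrete, here is a minimal source-filter sketch in plain Julia. It is not how SynthesisFilters.jl is implemented: the excitation is a toy pulse train, and the "spectral parameters" are just the two coefficients of a fixed resonator. The names resonate, pulse_train, demo_fs and the 500 Hz / 0.97 values are assumptions chosen for illustration.

```julia
# Two-pole resonator: y[n] = e[n] - a1*y[n-1] - a2*y[n-2]
function resonate(e, a1, a2)
    y = zeros(Float64, length(e))
    y1 = 0.0  # y[n-1]
    y2 = 0.0  # y[n-2]
    for n in 1:length(e)
        y[n] = e[n] - a1 * y1 - a2 * y2
        y2 = y1
        y1 = y[n]
    end
    return y
end

# Toy excitation: a unit pulse every `period` samples, starting at sample 1.
function pulse_train(len, period)
    e = zeros(Float64, len)
    for n in 1:period:len
        e[n] = 1.0
    end
    return e
end

demo_fs = 16000                             # illustrative sampling rate
r = 0.97                                    # pole radius (illustrative)
a1 = -2 * r * cos(2 * pi * 500 / demo_fs)   # resonance near 500 Hz
a2 = r^2
demo_y = resonate(pulse_train(400, 100), a1, a2)
```

The excitation decides when energy enters the system (here every 100 samples, i.e. 160 Hz at 16 kHz), while the filter coefficients shape the spectrum of the result. In the rest of the notebook both parts vary over time.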

Excitation signal

In this notebook, we use a pre-extracted excitation signal for test16k.wav in the example directory.


In [6]:
# Note about excitation
# fs: 16000
# frame period: 5.0 ms
# F0 analysis: estimated by WORLD.dio and WORLD.stonemask
# Excitation generation: periodic pulse for voiced segments and Gaussian random
# values for unvoiced segments
base_excitation = vec(readdlm(joinpath(dirname(@__FILE__), "data", "test16k_excitation.txt")))
wavplot(base_excitation)
inline_audioplayer(base_excitation ./ maximum(base_excitation), fs)
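
The note in the cell above describes the excitation as periodic pulses for voiced segments and Gaussian random values for unvoiced segments. The following is a minimal sketch of that idea in plain Julia, not the script that produced test16k_excitation.txt. The function name make_excitation, the convention that f0 == 0 marks an unvoiced frame, and the sqrt(period) pulse amplitude are assumptions.

```julia
# f0: one F0 value per frame in Hz (0 means unvoiced, by assumption)
# fs: sampling rate in Hz, hopsize: samples per frame
function make_excitation(f0, fs, hopsize)
    e = zeros(Float64, length(f0) * hopsize)
    phase = 0.0  # fraction of the current pitch period elapsed so far
    for t in 1:length(f0)
        for i in 1:hopsize
            n = (t - 1) * hopsize + i
            if f0[t] > 0
                # Voiced: emit a pulse each time a full pitch period has elapsed.
                phase += f0[t] / fs
                if phase >= 1.0
                    phase -= 1.0
                    e[n] = sqrt(fs / f0[t])  # scale so the average power is about 1
                end
            else
                # Unvoiced: white Gaussian noise, and restart the pitch phase.
                e[n] = randn()
                phase = 0.0
            end
        end
    end
    return e
end

# Two voiced frames at 250 Hz followed by one unvoiced frame.
demo_e = make_excitation([250.0, 250.0, 0.0], 16000, 80)
```

With fs = 16000 and F0 = 250 Hz the pitch period is 64 samples, so the voiced part of demo_e contains pulses at samples 64 and 128.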


Split audio signal into overlapping time frames and apply windowing

This is a basic step before mel-generalized cepstrum analysis, for which windowing is essential.


In [7]:
framelen = 512
hopsize = 80 # 5.0 ms for fs 16000
noverlap = framelen - hopsize

# Note that mgcep analysis assumes a power-normalized window, i.e. Σₙ w(n)² = 1
win = DSP.blackman(framelen) ./ sqrt(sumabs2(DSP.blackman(framelen)))
@assert isapprox(sumabs2(win), 1.0)

# create a windowed signal matrix in which each column is a windowed time slice
as = arraysplit(x, framelen, noverlap)
xw = Array(Float64, framelen, length(as))
for t=1:length(as)
    xw[:,t] = as[t]
end

# col-wise windowing
xw .*= win;
@show size(xw)


size(xw) = (512,753)
Out[7]:
(512,753)
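
For readers who want to see what the framing and windowing step does without DSP.jl, here is a minimal sketch using only Base functions. The helper names blackman_window and split_frames are hypothetical. The sketch assumes the standard Blackman coefficients (0.42, 0.5, 0.08) and that trailing samples that do not fill a whole frame are dropped.

```julia
# Blackman window of length n (standard 0.42 / 0.5 / 0.08 coefficients).
function blackman_window(n)
    w = zeros(Float64, n)
    for i in 1:n
        p = 2 * pi * (i - 1) / (n - 1)
        w[i] = 0.42 - 0.5 * cos(p) + 0.08 * cos(2 * p)
    end
    return w
end

# Split x into overlapping frames; column t starts at sample (t-1)*hopsize + 1.
function split_frames(x, framelen, hopsize)
    nframes = div(length(x) - framelen, hopsize) + 1
    frames = zeros(Float64, framelen, nframes)
    for t in 1:nframes
        start = (t - 1) * hopsize
        for i in 1:framelen
            frames[i, t] = x[start + i]
        end
    end
    return frames
end

# Power-normalize the window so that the sum of w(n)^2 equals 1.
raw = blackman_window(8)
demo_win = raw ./ sqrt(sum(abs2, raw))

# 20 samples, frame length 8, hop size 4 -> 4 frames.
demo_frames = split_frames(collect(1.0:20.0), 8, 4)
demo_windowed = demo_frames .* demo_win   # column-wise windowing
```

With framelen = 512 and hopsize = 80, the same frame-count formula gives one column per 5 ms step, which is why the matrix above has 512 rows.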

Spectral parameter estimation

You can extract many kinds of spectral parameters using MelGeneralizedCepstrums.jl. In the following example, we extract the mel-cepstrum from the windowed signal and then show the spectral envelope estimate.


In [8]:
c = estimate(MelCepstrum(20, mcepalpha(fs)), xw)
imshow(c, origin="lower", aspect="auto")
colorbar()


Out[8]:
PyObject <matplotlib.colorbar.Colorbar instance at 0x7fcc0f712998>


In [9]:
# Let's look at the spectral envelope estimate
imshow(real(mgc2sp(c, framelen)), origin="lower", aspect="auto")
colorbar()


Out[9]:
PyObject <matplotlib.colorbar.Colorbar instance at 0x7fcc0e247368>
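
The relation between a mel-cepstrum and its spectral envelope can be sketched in a few lines of plain Julia. This is not the mgc2sp implementation, only the textbook relation under the following assumptions: the envelope is H(z) = exp(Σₘ c(m) z̃⁻ᵐ), where z̃⁻¹ is a first-order all-pass with warping parameter α, so that log|H| at frequency ω is Σₘ c(m) cos(m ω̃) with ω̃ the warped frequency. The function names and the value α = 0.41 are illustrative.

```julia
# Phase response of the first-order all-pass (z^-1 - alpha) / (1 - alpha z^-1),
# i.e. the warped frequency that corresponds to omega.
warp(omega, alpha) = omega + 2 * atan(alpha * sin(omega) / (1 - alpha * cos(omega)))

# Log-magnitude envelope at nbins frequencies from 0 to pi (inclusive).
# c[1] is c(0), c[2] is c(1), and so on.
function cepstrum_to_logmag(c, alpha, nbins)
    logmag = zeros(Float64, nbins)
    for k in 1:nbins
        w = warp(pi * (k - 1) / (nbins - 1), alpha)
        for m in 1:length(c)
            logmag[k] += c[m] * cos((m - 1) * w)
        end
    end
    return logmag
end

# alpha = 0 gives an ordinary (unwarped) cepstrum.
demo_linear = cepstrum_to_logmag([0.0, 1.0], 0.0, 3)
# A positive alpha stretches the low-frequency region.
demo_warped = cepstrum_to_logmag([0.0, 1.0], 0.41, 3)
```

With α = 0 the single cosine term crosses zero at ω = π/2. With α = 0.41 the zero crossing moves to a lower frequency, so the middle bin is already negative: low frequencies get more resolution, which is the point of the mel scale.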

Compare synthesis filters

Let's compare the waveforms synthesized with various synthesis filters.

Synthesis from Cepstrum


In [10]:
c = estimate(LinearCepstrum(25), xw)
y = synthesis(base_excitation, c, hopsize)
wavcompare(x, y, label="Cepstrum-based synthesized waveform")
inline_audioplayer(round(Int16, clamp(y, typemin(Int16), typemax(Int16))), fs)


Synthesis from Mel-Cepstrum


In [11]:
c = estimate(MelCepstrum(25, mcepalpha(fs)), xw)
y = synthesis(base_excitation, c, hopsize)
wavcompare(x, y, label="Mel-cepstrum-based synthesized waveform")
inline_audioplayer(round(Int16, clamp(y, typemin(Int16), typemax(Int16))), fs)


Synthesis from Mel-generalized cepstrum


In [12]:
c = estimate(MelGeneralizedCepstrum(25, mcepalpha(fs), -1/4), xw)
y = synthesis(base_excitation, c, hopsize)
wavcompare(x, y, label="Mel-generalized cepstrum based synthesized waveform")
inline_audioplayer(round(Int16, clamp(y, typemin(Int16), typemax(Int16))), fs)


Synthesis from LPC


In [13]:
l = estimate(LinearPredictionCoef(25), xw, use_mgcep=true)
y = synthesis(base_excitation, l, hopsize)
wavcompare(x, y, label="LPC-based synthesized waveform")
inline_audioplayer(round(Int16, clamp(y, typemin(Int16), typemax(Int16))), fs)
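
To illustrate what frame-wise all-pole synthesis involves, here is a minimal sketch in plain Julia. It is not the AllPoleDF implementation. It assumes an SPTK-style layout in which each column of the coefficient matrix is [K, a₁, ..., a_M] for H(z) = K / (1 + Σₖ aₖ z⁻ᵏ), and it switches coefficients abruptly at frame boundaries instead of interpolating them. The function name allpole_synthesis is hypothetical.

```julia
# e: excitation, lpc: (M+1) x nframes matrix with gain K in row 1 (assumed layout),
# hopsize: number of samples synthesized per frame
function allpole_synthesis(e, lpc, hopsize)
    order = size(lpc, 1) - 1
    nframes = size(lpc, 2)
    len = min(length(e), nframes * hopsize)
    y = zeros(Float64, len)
    d = zeros(Float64, order)  # d[k] holds y[n-k]
    for n in 1:len
        t = div(n - 1, hopsize) + 1        # frame that sample n belongs to
        v = lpc[1, t] * e[n]               # apply the gain
        for k in 1:order
            v -= lpc[k + 1, t] * d[k]      # feedback from past outputs
        end
        for k in order:-1:2                # shift the delay line
            d[k] = d[k - 1]
        end
        if order >= 1
            d[1] = v
        end
        y[n] = v
    end
    return y
end

# Frame 1: K = 1, a1 = -0.5. Frame 2: K = 2, a1 = 0 (pure gain).
demo_lpc = [1.0 2.0; -0.5 0.0]
demo_out = allpole_synthesis([1.0, 1.0, 1.0, 1.0], demo_lpc, 2)
```

The filter state is carried across frame boundaries, so only the coefficients change from one frame to the next.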


Synthesis from PARCOR


In [14]:
l = lpc2par(estimate(LinearPredictionCoef(25), xw))
y = synthesis(base_excitation, l, hopsize)
wavcompare(x, y, label="PARCOR-based synthesized waveform")
inline_audioplayer(round(Int16, clamp(y, typemin(Int16), typemax(Int16))), fs)
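
PARCOR (reflection) coefficients describe the same all-pole filter as LPC, in lattice form. The sketch below, in plain Julia, converts LPC to PARCOR with the step-down recursion and checks that a lattice filter and a direct-form filter give the same output. It is not the lpc2par or AllPoleLatticeDF implementation. It assumes A(z) = 1 + Σₖ aₖ z⁻ᵏ with unit gain and the sign convention k_M = a_M, and sign conventions differ between toolkits. All function names are hypothetical.

```julia
# Step-down recursion: a = [a_1, ..., a_M] -> k = [k_1, ..., k_M]
function lpc_to_parcor(a)
    M = length(a)
    cur = copy(a)
    k = zeros(Float64, M)
    for m in M:-1:1
        k[m] = cur[m]
        if m > 1
            prev = zeros(Float64, m - 1)
            for i in 1:(m - 1)
                prev[i] = (cur[i] - k[m] * cur[m - i]) / (1 - k[m]^2)
            end
            cur = prev
        end
    end
    return k
end

# All-pole lattice filter driven by excitation e.
function lattice_synthesis(e, k)
    M = length(k)
    d = zeros(Float64, M)  # d[m] holds the delayed backward signal b_{m-1}(n-1)
    y = zeros(Float64, length(e))
    for n in 1:length(e)
        f = e[n]
        for m in M:-1:1
            f -= k[m] * d[m]                 # forward signal f_{m-1}(n)
            if m < M
                d[m + 1] = d[m] + k[m] * f   # b_m(n), used by stage m+1 at sample n+1
            end
        end
        d[1] = f
        y[n] = f
    end
    return y
end

# Direct-form reference: y[n] = e[n] - sum_i a_i y[n-i]
function direct_synthesis(e, a)
    y = zeros(Float64, length(e))
    for n in 1:length(e)
        acc = e[n]
        for i in 1:length(a)
            if n - i >= 1
                acc -= a[i] * y[n - i]
            end
        end
        y[n] = acc
    end
    return y
end

demo_a = [-0.6, -0.07, 0.06]   # poles at 0.5, 0.4 and -0.3 (stable)
demo_k = lpc_to_parcor(demo_a)
impulse = zeros(Float64, 50)
impulse[1] = 1.0
y_lattice = lattice_synthesis(impulse, demo_k)
y_direct = direct_synthesis(impulse, demo_a)
```

One practical reason to prefer the lattice form is that stability is easy to check: the filter is stable exactly when every |kₘ| < 1, which holds for demo_k here.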


Synthesis from LSP


In [15]:
l = lpc2lsp(estimate(LinearPredictionCoef(15), xw))
y = synthesis(base_excitation, l, hopsize)
wavcompare(x, y, label="LSP-based synthesized waveform")
inline_audioplayer(round(Int16, clamp(y, typemin(Int16), typemax(Int16))), fs)